Reformulating the history matching problem from a least-squares mathematical optimization problem into a Markov Decision Process makes it possible to solve the problem with reinforcement learning. This method provides a mechanism by which an artificial deep neural network agent can interact with the reservoir simulator and find multiple different solutions to the problem. Such a formulation allows the problem to be solved in parallel by launching multiple concurrent environments, enabling the agent to learn from all the environments at once and achieving a significant speed-up.
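The reformulation can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the authors' setup: `toy_simulator` replaces the reservoir simulator with a one-parameter linear model, `OBSERVED` is made-up history data, and `greedy_action` is a simple greedy rule standing in for the learned neural-network policy. The environments are stepped in a loop rather than truly in parallel. The state is the current parameter, the action is an adjustment to it, and the reward is the negative least-squares mismatch.

```python
# Toy sketch: history matching cast as an MDP (all names and data are hypothetical).

INPUTS = [1.0, 2.0, 3.0]      # hypothetical simulator inputs
OBSERVED = [2.0, 4.0, 6.0]    # hypothetical observed history (generated with k = 2)


def toy_simulator(k):
    """Stand-in for a reservoir simulator: a linear response with one parameter."""
    return [k * x for x in INPUTS]


def mismatch(k):
    """Least-squares misfit between simulated and observed data."""
    return sum((s - o) ** 2 for s, o in zip(toy_simulator(k), OBSERVED))


class HistoryMatchingEnv:
    """State: current parameter k. Action: increment to k. Reward: -mismatch."""

    def __init__(self, k0):
        self.k = k0

    def step(self, action):
        self.k += action
        return self.k, -mismatch(self.k)


def greedy_action(env, step_size=0.1):
    """Stand-in for a learned policy: pick the increment with the lowest misfit."""
    return min((-step_size, 0.0, step_size), key=lambda a: mismatch(env.k + a))


def run(envs, n_steps=50):
    """Step several environments side by side (sequentially, for simplicity)."""
    for _ in range(n_steps):
        for env in envs:
            env.step(greedy_action(env))
    return [env.k for env in envs]


if __name__ == "__main__":
    # Three environments started from different initial guesses.
    print(run([HistoryMatchingEnv(k0) for k0 in (0.0, 5.0, 1.5)]))
```

In a real setting, each environment would wrap a separate simulator instance and the greedy rule would be replaced by a trained agent that collects experience from all environments.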
translated by 谷歌翻译
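Evaluating the real-world performance of network protocols is challenging. Randomized control trials (RCTs) are expensive and inaccessible to most researchers, while expert-designed simulators fail to capture the complex behaviors of real networks. We present CausalSim, a data-driven simulator for network protocols that addresses this challenge. Learning network behavior from observational data is complicated by the bias introduced by the protocols used during data collection. CausalSim uses traces from an initial RCT under a set of protocols to learn a causal network model, effectively removing the biases present in the data. Using this model, CausalSim can then simulate any protocol over the same traces (i.e., for counterfactual predictions). Key to CausalSim is a novel use of adversarial neural network training that exploits the distributional invariances that arise because the training data comes from an RCT. Our extensive evaluation of CausalSim on both real and synthetic datasets and two use cases, including more than nine months of real data from the Puffer video streaming system, shows that it provides accurate counterfactual predictions, reducing prediction error by 44% and 53% on average compared to expert-designed and standard supervised learning baselines.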
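We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we observe only a single historical trajectory for each agent under an unknown, potentially sub-optimal policy. We find that the performance of state-of-the-art offline and model-based RL methods degrades significantly under such limited data availability, even for benchmark settings commonly regarded as "solved", such as "MountainCar" and "CartPole". To address this challenge, we propose a model-based offline RL approach that first learns a personalized simulator for each agent by collectively using the historical trajectories of all agents, prior to learning a policy. We do so by positing that the transition dynamics across agents can be represented as a latent function of latent factors associated with agents, states, and actions; subsequently, we theoretically establish that this function is well approximated by a "low-rank" decomposition of separable agent, state, and action latent functions. This representation suggests a simple, regularized neural network architecture that effectively learns the transition dynamics of each agent, even with scarce, offline data. We perform extensive experiments across several benchmark environments and RL methods. The consistent improvement of our approach, measured in terms of state dynamics prediction and eventual reward, confirms the efficacy of our framework in leveraging limited historical data to simultaneously learn personalized policies across agents.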
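We introduce and analyze a variant of multivariate singular spectrum analysis (mSSA), a popular time series method for imputing and forecasting multivariate time series. Under a spatio-temporal factor model that we introduce, given $n$ time series and $t$ observations per time series, we establish that the prediction mean-squared error for both imputation and out-of-sample forecasting effectively scales as $1/\sqrt{\min(n,t)\,t}$. This is an improvement over: (i) the $1/\sqrt{t}$ error scaling of SSA, the restriction of mSSA to univariate time series; (ii) the $1/\min(n,t)$ error scaling of matrix estimation methods, which do not exploit temporal structure in the data. The spatio-temporal model we introduce includes any finite sum and product of harmonics, polynomials, differentiable periodic functions, and Hölder continuous functions. Our out-of-sample forecasting result could be of independent interest for online learning under a spatio-temporal factor model. Empirically, on benchmark datasets, our variant of mSSA performs competitively with state-of-the-art neural-network time series methods (e.g., DeepAR, LSTM) and significantly outperforms classical methods such as vector autoregression (VAR). Finally, we propose extensions of mSSA: (i) a variant to estimate the time-varying variance of a time series; (ii) a tensor variant that has better sample complexity for some regimes of $n$ and $t$.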
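To make the three error scalings concrete, the short sketch below evaluates them for a few sizes. It is only an illustration of the rates quoted in the abstract: constants and logarithmic factors are ignored, and the function names are hypothetical.

```python
import math


def mssa_error_rate(n, t):
    """mSSA variant: 1 / sqrt(min(n, t) * t), with n series and t observations each."""
    return 1.0 / math.sqrt(min(n, t) * t)


def ssa_error_rate(t):
    """Univariate SSA: 1 / sqrt(t)."""
    return 1.0 / math.sqrt(t)


def matrix_estimation_error_rate(n, t):
    """Matrix estimation without temporal structure: 1 / min(n, t)."""
    return 1.0 / min(n, t)


if __name__ == "__main__":
    for n, t in [(10, 1000), (100, 10000), (1000, 1000)]:
        print(
            f"n={n:5d} t={t:6d}  "
            f"mSSA={mssa_error_rate(n, t):.5f}  "
            f"SSA={ssa_error_rate(t):.5f}  "
            f"matrix est.={matrix_estimation_error_rate(n, t):.5f}"
        )
```

Under these idealized rates, the mSSA scaling is never worse than either baseline, and it coincides with the matrix estimation rate when $n \ge t$.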
Multi-view projection techniques have shown themselves to be highly effective in achieving top-performing results in the recognition of 3D shapes. These methods involve learning how to combine information from multiple view-points. However, the camera view-points from which these views are obtained are often fixed for all shapes. To overcome the static nature of current multi-view techniques, we propose learning these view-points. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition. As a result, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that is capable of rendering both 3D meshes and point clouds. Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis indicates that our approach exhibits improved robustness to occlusion compared to other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
Recent advances in Neural Radiance Fields (NeRFs) treat the problem of novel view synthesis as Sparse Radiance Field (SRF) optimization, using sparse voxels for efficient and fast rendering (Plenoxels, InstantNGP). In order to leverage machine learning and the adoption of SRFs as a 3D representation, we present SPARF, a large-scale ShapeNet-based synthetic dataset for novel view synthesis consisting of $\sim$ 17 million images rendered from nearly 40,000 shapes at high resolution ($400 \times 400$ pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis and includes more than one million 3D-optimized radiance fields with multiple voxel resolutions. Furthermore, we propose a novel pipeline (SuRFNet) that learns to generate sparse voxel radiance fields from only a few views. This is done by using the densely collected SPARF dataset and 3D sparse convolutions. SuRFNet employs partial SRFs from one or a few images and a specialized SRF loss to learn to generate high-quality sparse voxel radiance fields that can be rendered from novel views. Our approach achieves state-of-the-art results on ShapeNet in the task of unconstrained novel view synthesis from few views, as compared to recent baselines. The SPARF dataset will be made public with the code and models on the project website https://abdullahamdi.com/sparf/ .
Handwriting recognition has long been a field of great interest in the Artificial Intelligence domain. Owing to its broad real-life use cases, it has been researched widely. Prominent work in this field has focused mainly on Latin characters, whereas Arabic handwritten character recognition is still relatively unexplored. The inherently cursive nature of Arabic characters and the variations in writing styles across individuals make the task even more challenging. We identify some probable reasons behind this and propose a lightweight Convolutional Neural Network-based architecture for recognizing Arabic characters and digits. The proposed pipeline consists of a total of 18 layers: four layers each for convolution, pooling, batch normalization, and dropout, followed by one global average pooling layer and one dense layer. Furthermore, we thoroughly investigate different hyperparameter choices, such as the optimizer, kernel initializer, and activation function. Evaluated on the publicly available 'Arabic Handwritten Character Dataset (AHCD)' and 'Modified Arabic handwritten digits Database (MadBase)' datasets, the proposed model achieves accuracies of 96.93% and 99.35%, respectively, which is comparable to the state of the art and makes it a suitable solution for real-life end-level applications.
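The layer count can be checked with a small sketch. The abstract gives only the number of layers of each type; the ordering within each block, the choice of max pooling, and the layer names below are assumptions made for illustration.

```python
def build_layer_plan(num_blocks=4):
    """Return a list of layer names for the 18-layer pipeline.

    Assumed (not stated in the abstract): each block is ordered
    convolution -> pooling -> batch normalization -> dropout,
    and the pooling layers are max pooling.
    """
    block = ["Conv2D", "MaxPooling2D", "BatchNormalization", "Dropout"]
    plan = []
    for i in range(1, num_blocks + 1):
        plan.extend(f"{name}_{i}" for name in block)
    # Classification head: one global average pooling layer and one dense layer.
    plan.extend(["GlobalAveragePooling2D", "Dense"])
    return plan


if __name__ == "__main__":
    layers = build_layer_plan()
    print(len(layers))
    print(layers)
```

Four blocks of four layers plus the two-layer head give $4 \times 4 + 2 = 18$ layers, matching the stated total.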
Ultrasound is progressing toward becoming an affordable and versatile solution for medical imaging. With the advent of the COVID-19 global pandemic, there is a need to fully automate ultrasound imaging, as it requires trained operators to be in close proximity to patients for long periods of time. In this work, we investigate the important yet seldom-studied problem of scan target localization in the setting of lung ultrasound imaging. We propose a purely vision-based, data-driven method that incorporates learning-based computer vision techniques. We combine a human pose estimation model with a specially designed regression model to predict the lung ultrasound scan targets, and deploy multiview stereo vision to enhance the consistency of 3D target localization. While related works mostly focus on phantom experiments, we collect data from 30 human subjects for testing. Our method attains an accuracy of 15.52 (9.47) mm for probe positioning and 4.32 (3.69)° for probe orientation, with a success rate above 80% under an error threshold of 25 mm for all scan targets. Moreover, our approach can serve as a general solution for other ultrasound modalities. The implementation code has been released.
With the recent advances in video and 3D understanding, novel 4D spatio-temporal challenges fusing both concepts have emerged. In this direction, the Ego4D Episodic Memory Benchmark proposed a task for Visual Queries with 3D Localization (VQ3D). Given an egocentric video clip and an image crop depicting a query object, the goal is to localize the 3D position of the center of that query object with respect to the camera pose of a query frame. Current methods tackle VQ3D by lifting the 2D localization results of the sister task, Visual Queries with 2D Localization (VQ2D), into a 3D reconstruction. Yet, we point out that the low number of Queries with Poses (QwP) in previous VQ3D methods severely hinders their overall success rate and highlights the need for further effort in 3D modeling to tackle the VQ3D task. In this work, we formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos. We estimate more robust camera poses, leading to more successful object queries and substantially improved VQ3D performance. In practice, our method reaches a top-1 overall success rate of 86.36% on the Ego4D Episodic Memory Benchmark VQ3D, a 10x improvement over the previous state of the art. In addition, we provide a complete empirical study highlighting the remaining challenges in VQ3D.
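Expressing an object position "with respect to the camera pose of a query frame" amounts to a rigid change of coordinates. The sketch below shows that standard step only, not the authors' pipeline. It assumes the query-frame pose is given as a camera-to-world rotation $R$ and translation $t$, so that $p_{\text{cam}} = R^{\top}(p_{\text{world}} - t)$; the function name and the example pose are hypothetical.

```python
def world_to_camera(point_world, rotation_c2w, translation_c2w):
    """Express a 3D world point in the query camera's coordinate frame.

    Assumes the pose is camera-to-world: p_world = R @ p_cam + t,
    hence p_cam = R^T @ (p_world - t).
    """
    diff = [p - t for p, t in zip(point_world, translation_c2w)]
    # Multiply by the transpose of R: component i is sum_j R[j][i] * diff[j].
    return [
        sum(rotation_c2w[j][i] * diff[j] for j in range(3))
        for i in range(3)
    ]


if __name__ == "__main__":
    # Hypothetical pose: camera rotated 90 degrees about the world z-axis,
    # located at (1, 0, 0) in world coordinates.
    R = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
    t = [1.0, 0.0, 0.0]
    object_center_world = [1.0, 2.0, 0.0]
    print(world_to_camera(object_center_world, R, t))
```

If a dataset instead stores world-to-camera poses, the transform is applied directly as $p_{\text{cam}} = R\,p_{\text{world}} + t$, so the pose convention must be checked before use.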
Via operator-theoretic methods, we formalize the concentration phenomenon for a given observable $r$ of a discrete-time Markov chain with $\mu_{\pi}$ as its invariant ergodic measure, possibly supported on an unbounded state space. The main contribution of this paper is to circumvent tedious probabilistic methods by studying the composition of the Markov transition operator $P$ followed by a multiplication operator defined by $e^{r}$. It turns out that even if the observable/reward function is unbounded, sharp non-asymptotic concentration bounds follow provided that, for some $q>2$, $\|e^{r}\|_{q \rightarrow 2} \propto \exp\big(\mu_{\pi}(r) +\frac{2q}{q-2}\big) $ and $P$ is hyperbounded with norm control $\|P\|_{2 \rightarrow q }< e^{\frac{1}{2}[\frac{1}{2}-\frac{1}{q}]}$. A \emph{transport-entropy} inequality ensures the aforementioned upper bound on the multiplication operator for all $q>2$. The role of \emph{reversibility} in the concentration phenomenon is demystified. These results are particularly useful for the reinforcement learning and controls communities, as they allow for concentration inequalities with respect to standard unbounded observables/reward functions when exact knowledge of the system is not available, let alone the reversibility of the stationary measure.